Minimax Estimation in Sparse Canonical Correlation Analysis
Authors
Abstract
Canonical correlation analysis is a widely used multivariate statistical technique for exploring the relation between two sets of variables. This paper considers the problem of estimating the leading canonical correlation directions in high-dimensional settings. Recently, under the assumption that the leading canonical correlation directions are sparse, various procedures have been proposed for many high-dimensional applications involving massive data sets. However, little theoretical justification has been available in the literature. In this paper, we establish rate-optimal non-asymptotic minimax estimation with respect to an appropriate loss function for a wide range of model spaces. Two interesting phenomena are observed. First, the minimax rates are not affected by the presence of nuisance parameters, namely the covariance matrices of the two sets of random variables, even though these need to be estimated in the canonical correlation analysis problem. Second, we allow for the presence of residual canonical correlation directions; they do not influence the minimax rates under a mild condition on the eigengap. A generalized sin-theta theorem and an empirical process bound for Gaussian quadratic forms under a rank constraint are used to establish the minimax upper bounds, and these may be of independent interest.
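For readers unfamiliar with the objects being estimated, the following is a minimal sketch of classical (non-sparse, low-dimensional) sample CCA, in which the leading canonical correlation directions come from the top singular pair of the whitened cross-covariance matrix. It is not the estimator studied in the paper, which targets sparse directions in high dimensions; the function names `inv_sqrt` and `leading_cca` and the simulated data are assumptions made purely for illustration.

```python
import numpy as np

def inv_sqrt(mat):
    """Inverse symmetric square root of a positive definite matrix."""
    vals, vecs = np.linalg.eigh(mat)
    return vecs @ np.diag(vals ** -0.5) @ vecs.T

def leading_cca(X, Y):
    """Leading sample canonical correlation and directions (hypothetical helper).

    Returns (rho, u, v), where u and v are normalized so that
    u' Sx u = v' Sy v = 1 for the sample covariances Sx, Sy.
    """
    n = X.shape[0]
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    Sx = Xc.T @ Xc / n    # sample covariance of X (a nuisance parameter)
    Sy = Yc.T @ Yc / n    # sample covariance of Y (a nuisance parameter)
    Sxy = Xc.T @ Yc / n   # sample cross-covariance
    Sx_is, Sy_is = inv_sqrt(Sx), inv_sqrt(Sy)
    # Singular values of the whitened cross-covariance are the canonical correlations.
    U, s, Vt = np.linalg.svd(Sx_is @ Sxy @ Sy_is)
    u = Sx_is @ U[:, 0]   # leading direction for X
    v = Sy_is @ Vt[0]     # leading direction for Y
    return s[0], u, v

# Simulated example (assumed setup): one shared latent factor z enters the
# first coordinate of each block, so the true leading directions are sparse.
rng = np.random.default_rng(0)
n = 2000
z = rng.standard_normal(n)
X = rng.standard_normal((n, 5))
Y = rng.standard_normal((n, 4))
X[:, 0] += z
Y[:, 0] += z

rho, u, v = leading_cca(X, Y)
print("leading canonical correlation:", rho)
print("direction for X:", u)
print("direction for Y:", v)
```

In this simulated setup the population leading canonical correlation is 0.5, and the population directions are supported on the first coordinate of each block. The sketch inverts the sample covariance matrices, which is only feasible when the sample size exceeds both dimensions; that limitation is one reason sparse, high-dimensional methods are needed.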
Similar resources
Sparse CCA: Adaptive Estimation and Computational Barriers
Canonical correlation analysis (CCA) is a classical and important multivariate technique for exploring the relationship between two sets of variables. It has applications in many fields including genomics and imaging, to extract meaningful features as well as to use the features for subsequent analysis. This paper considers adaptive and computationally tractable estimation of leading sparse can...
An Efficient and Optimal Method for Sparse Canonical Correlation Analysis
Canonical correlation analysis (CCA) is an important multivariate technique for exploring the relationship between two sets of variables which finds applications in many fields. This paper considers the problem of estimating the subspaces spanned by sparse leading canonical correlation directions when the ambient dimensions are high. We propose a computationally efficient two-stage estimation p...
Supplement to “Minimax Estimation in Sparse Canonical Correlation Analysis”
In this appendix, we prove Theorem 4 and Lemmas 7–12 in order. A.1. Proof of Theorem 4. We first need a lemma for perturbation bound of square root matrices. Lemma 16. Let $A$, $B$ be positive semi-definite matrices, and then for any unitarily invariant norm $\|\cdot\|$, $\|A^{1/2} - B^{1/2}\| \le \frac{1}{\sigma_{\min}(A^{1/2}) + \sigma_{\min}(B^{1/2})} \|A - B\|$. Proof. The proof essentially follows the idea of [27]. Let D = ...
Inference for high-dimensional differential correlation matrices
Motivated by differential co-expression analysis in genomics, we consider in this paper estimation and testing of high-dimensional differential correlation matrices. An adaptive thresholding procedure is introduced and theoretical guarantees are given. Minimax rate of convergence is established and the proposed estimator is shown to be adaptively rate-optimal over collections of paired correlat...
Methods of Canonical Analysis for Functional Data
We consider estimates for functional canonical correlations and canonical weight functions. Four computational methods for the estimation of functional canonical correlation and canonical weight functions are proposed and compared, including one which is a slight variation of the spline method proposed by Leurgans, Moyeed and Silverman (1993). We propose dimension reduction and dimension augmen...